AITopics

2605.26271

Country: North America > United States (0.28)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

arXiv.org Machine Learning May-20-2026

Factor Augmented High-Dimensional SGD

Li, Shubo, Han, Yuefeng, Yu, Xiufan

Stochastic gradient descent (SGD) has been a cornerstone of machine learning since the pioneering work of Robbins & Monro (1951). Beyond its algorithmic simplicity and scalability, SGD has also become a central object of theoretical study, with refined analyses linking its dynamics to implicit regularization, generalization performance, and algorithmic stability. For decades, theoretical analyses of SGD have largely resided within the realm of classical stochastic approximation (Polyak & Juditsky, 1992; Lai, 2003; Bottou et al., 2018), where the data dimension is considered fixed while the sample size tends to infinity. While this regime has yielded foundational insights, it no longer fully reflects the characteristics of modern learning systems. Contemporary applications often operate in regimes where data dimension, sample size, and model complexity grow together, calling for new theoretical tools and perspectives that go beyond traditional asymptotic analyses. In this study, we focus on learning tasks involving high-dimensional predictors. When SGD is applied directly to such data, the dimensionality of the feature space propagates into the optimization process, resulting in a high-dimensional (HD) parameter space. Algorithmically, one trending strategy is to approximate the gradient updates using a low-rank representation to reduce memory costs and accelerate computation (Wang et al., 2018; Vogels et al., 2019; Kozak et al., 2019; Kasiviswanathan, 2021; Zhao et al., 2024). Theoretically, despite the vast literature on SGD, convergence guarantees of HD-SGD remain limited (Garrigos & Gower, 2023; Li et al., 2025).
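For readers unfamiliar with the baseline the abstract starts from, the following is a minimal sketch of plain SGD on a least-squares problem. It is not the paper's factor-augmented method; the function names (`make_data`, `sgd_least_squares`), the step-size schedule 1 / (t + offset), and the synthetic data are all assumptions made for illustration.

```python
import random


def make_data(n, true_w, noise=0.1, seed=1):
    """Draw n pairs (x, y) with y = <true_w, x> + Gaussian noise (synthetic, illustrative)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in true_w]
        y = sum(w * xi for w, xi in zip(true_w, x)) + rng.gauss(0.0, noise)
        data.append((x, y))
    return data


def sgd_least_squares(data, dim, steps, offset=10.0, seed=0):
    """Plain SGD on squared loss with a decaying step size 1 / (t + offset)."""
    rng = random.Random(seed)
    w = [0.0] * dim
    for t in range(1, steps + 1):
        x, y = data[rng.randrange(len(data))]  # sample one observation uniformly
        residual = sum(wi * xi for wi, xi in zip(w, x)) - y
        lr = 1.0 / (t + offset)
        # The gradient of 0.5 * residual**2 with respect to w is residual * x.
        w = [wi - lr * residual * xi for wi, xi in zip(w, x)]
    return w


true_w = [1.0, -2.0, 0.5]
data = make_data(200, true_w)
w_hat = sgd_least_squares(data, dim=len(true_w), steps=5000)
max_err = max(abs(a - b) for a, b in zip(w_hat, true_w))
print("estimate:", [round(v, 3) for v in w_hat], "max error:", round(max_err, 3))
```

Each update touches all `dim` coordinates of `w`, which is how the dimension of the predictors carries over into the parameter space of the optimization, the regime the abstract is concerned with.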

artificial intelligence, factor model, machine learning, (16 more...)

2605.19291

Country: North America > United States (0.46)

Genre: Research Report > New Finding (0.88)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.90)

Neural Information Processing Systems Apr-24-2026, 22:25:32 GMT

Checklist

A.2: Comparison of the causal assumptions
A.3: Comparison of allowed temporal covariates
A.4: Unrelated works with similar terminology
The SyncTwin algorithm.
A.5: The generality of SyncTwin's assumed DGP
A.6: Estimation for control and new individuals
A.7: Algorithmic details and pseudocode
A.8: Optimization for the matching loss Lm
Simulation study.

artificial intelligence, machine learning, modeling & simulation, (19 more...)

Genre: Research Report > Experimental Study (1.00)

Industry: Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

arXiv.org Machine Learning Apr-2-2026

Tucker Diffusion Model for High-dimensional Tensor Generation

Guo, Jianhua, Kong, Xinbing, Li, Zeyu, Mao, Junfan

Statistical inference on large-dimensional tensor data has been extensively studied in the literature and widely used in economics, biology, machine learning, and other fields, but how to generate a structured tensor with a target distribution is still a new problem. As powerful generative models, diffusion models have achieved remarkable success in learning complex distributions. However, their extension to generating multi-linear tensor-valued observations remains underexplored. In this work, we propose a novel Tucker diffusion model for learning high-dimensional tensor distributions. We show that the score function admits a structured decomposition under the low Tucker rank assumption, allowing it to be both accurately approximated and efficiently estimated using a carefully tailored tensor-shaped architecture named Tucker-Unet. Furthermore, the distribution of generated tensors, induced by the estimated score function, converges to the true data distribution at a rate depending on the maximum of tensor mode dimensions, thereby offering a clear theoretical advantage over the naive vectorized approach, which has a product dependence. Empirically, compared to existing approaches, the Tucker diffusion model demonstrates strong practical potential in synthetic and real-world tensor generation tasks, achieving comparable and sometimes even superior statistical performance with significantly reduced training and sampling costs.
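To give a rough feel for why a low Tucker rank helps, the sketch below counts the numbers stored by a full tensor versus a Tucker representation (one core tensor plus one factor matrix per mode). It only illustrates storage counts, not the paper's convergence rate or the Tucker-Unet architecture; the helper names and the example dimensions and ranks are assumptions.

```python
from math import prod


def full_tensor_size(dims):
    """Entries in the vectorized tensor: the product of all mode dimensions."""
    return prod(dims)


def tucker_size(dims, ranks):
    """Entries in a Tucker representation: core of shape `ranks` plus a d_k x r_k factor per mode."""
    core = prod(ranks)
    factors = sum(d * r for d, r in zip(dims, ranks))
    return core + factors


dims = (50, 50, 50)   # hypothetical mode dimensions
ranks = (3, 3, 3)     # hypothetical Tucker ranks
n_full = full_tensor_size(dims)
n_tucker = tucker_size(dims, ranks)
print("full:", n_full, "tucker:", n_tucker)
```

The full count grows with the product of the mode dimensions, while the Tucker count grows with their sum once the ranks are fixed, which mirrors the contrast the abstract draws with the naive vectorized approach.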

artificial intelligence, diffusion model, machine learning, (18 more...)

2604.00481

Country:

North America > United States > New York (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Lee, Inbeom, Jin, Tongtong, Aragam, Bryon

Beyond identifiability: Learning causal representations with few environments and finite samples

arXiv.org Machine Learning Mar-30-2026

We provide explicit, finite-sample guarantees for learning causal representations from data with a sublinear number of environments. Causal representation learning seeks to provide a rigorous foundation for the general representation learning problem by bridging causal models with latent factor models in order to learn interpretable representations with causal semantics. Despite a blossoming theory of identifiability in causal representation learning, estimation and finite-sample bounds are less well understood. We show that causal representations can be learned with only a logarithmic number of unknown, multi-node interventions, and that the intervention targets need not be carefully designed in advance. Through a careful perturbation analysis, we provide a new analysis of this problem that guarantees consistent recovery of (a) the latent causal graph, (b) the mixing matrix and representations, and (c) unknown intervention targets.

artificial intelligence, machine learning, representation, (15 more...)

2603.25796

Country:

Asia > Japan > Honshū > Tōhoku > Iwate Prefecture > Morioka (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Neural Information Processing Systems Feb-13-2026, 22:20:21 GMT

Modeling Dynamic Functional Connectivity with Latent Factor Gaussian Processes

Lingge Li, Dustin Pluta, Babak Shahbaba, Norbert Fortin, Hernando Ombao, Pierre Baldi

Neural Information Processing Systems http://nips.cc/

covariance, matrix, time series, (14 more...)

Country:

North America > United States > Wisconsin (0.04)
North America > Canada (0.04)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Health Care Technology (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (0.94)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (1.00)
Information Technology > Modeling & Simulation (0.85)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)

Neural Information Processing Systems Feb-11-2026, 10:06:39 GMT

Structuring Uncertainty for Fine-Grained Sampling in Stochastic Segmentation Networks

We obtain them directly from the low-rank Gaussian distribution for the logits in the network head of SSNs, based on a previously unconsidered view of this distribution as a factor model.

artificial intelligence, machine learning, segmentation, (19 more...)

Country: Europe > Germany (0.05)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
Information Technology > Artificial Intelligence > Vision (0.68)

Neural Information Processing Systems Feb-7-2026, 22:14:27 GMT

2a27b8144ac02f67687f76782a3b5d8f-AuthorFeedback.pdf

algorithm, baseline, validation, (17 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.56)

Wu, Skyler, Nair, Yash, Candès, Emmanuel J.

Efficient Evaluation of LLM Performance with Statistical Guarantees

arXiv.org Machine Learning Jan-30-2026

Exhaustively evaluating many large language models (LLMs) on a large suite of benchmarks is expensive. We cast benchmarking as finite-population inference and, under a fixed query budget, seek tight confidence intervals (CIs) for model accuracy with valid frequentist coverage. We propose Factorized Active Querying (FAQ), which (a) leverages historical information through a Bayesian factor model; (b) adaptively selects questions using a hybrid variance-reduction/active-learning sampling policy; and (c) maintains validity through Proactive Active Inference -- a finite-population extension of active inference (Zrnic & Candès, 2024) that enables direct question selection while preserving coverage. With negligible overhead cost, FAQ delivers up to $5\times$ effective sample size gains over strong baselines on two benchmark suites, across varying historical-data missingness levels: this means that it matches the CI width of uniform sampling while using up to $5\times$ fewer queries. We release our source code and our curated datasets to support reproducible evaluation and future research.
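The link between effective sample size and CI width can be sketched with the textbook normal-approximation interval for an accuracy estimate. This is not the FAQ procedure: it ignores the finite-population correction and the paper's Proactive Active Inference step, and the accuracy value 0.7, the budget of 200 queries, and the helper name `ci_half_width` are assumptions for illustration.

```python
from math import sqrt


def ci_half_width(p_hat, n, z=1.96):
    """Half-width of the normal-approximation CI for a proportion from n i.i.d. queries."""
    return z * sqrt(p_hat * (1.0 - p_hat) / n)


budget = 200   # hypothetical number of queries
gain = 5       # effective sample size multiplier quoted in the abstract
baseline = ci_half_width(0.7, budget)          # uniform sampling at the given budget
boosted = ci_half_width(0.7, budget * gain)    # same budget, 5x effective sample size
ratio = baseline / boosted
print("baseline:", round(baseline, 4), "boosted:", round(boosted, 4), "ratio:", round(ratio, 3))
```

Because the half-width scales as one over the square root of the sample size, a 5x effective sample size narrows the interval by a factor of the square root of 5 at a fixed budget, or equivalently matches the uniform-sampling width with one fifth of the queries.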

efficient evaluation, large language model, machine learning, (19 more...)

2601.20251

Country:

North America > United States > California > Santa Clara County > Stanford (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.64)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)